Shapley values are ubiquitous in interpretable Machine Learning due to their strong theoretical background and efficient implementation in the SHAP library. Computing these values previously incurred a cost exponential in the number of input features of an opaque model. Now, with efficient implementations such as Interventional TreeSHAP, this exponential burden is alleviated, provided one is explaining ensembles of decision trees. Although Interventional TreeSHAP has risen in popularity, it still lacks a formal proof of how and why it works. We provide such a proof, with the aim not only of increasing the transparency of the algorithm but also of encouraging further development of these ideas. Notably, our proof for Interventional TreeSHAP is easily adapted to Shapley-Taylor indices and one-hot-encoded features.
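To make the exponential cost mentioned above concrete, here is a minimal brute-force sketch of interventional Shapley values against a single background point. It is not Interventional TreeSHAP and does not use the SHAP library; the names `interventional_shapley`, `model`, `x`, and `z` are hypothetical and chosen only for illustration. Features inside a coalition take their value from the explained point `x`, and the remaining features take theirs from the background point `z`.

```python
from itertools import combinations
from math import factorial


def interventional_shapley(f, x, z):
    """Exact Shapley values of f at x, relative to one background point z.

    Brute force: every feature is evaluated against every coalition of the
    other features, so the number of calls to f grows like n * 2**(n - 1).
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition come from x, all others from z.
        return f([x[j] if j in coalition else z[j] for j in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Classical Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def model(v):
    # Hypothetical toy model: one additive term plus one interaction term.
    return 2.0 * v[0] + v[1] * v[2]


x = [1.0, 2.0, 3.0]  # point being explained
z = [0.0, 0.0, 0.0]  # background (reference) point
phi = interventional_shapley(model, x, z)
print(phi)
```

In this toy case the additive feature receives its full effect and the interaction between the last two features is split equally between them; the attributions sum to `model(x) - model(z)`, which is the efficiency property of Shapley values.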
translated by Google Translate
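Context: Machine learning (ML) has been at the heart of many innovations over the past few years. However, including it in so-called "safety-critical" systems, such as those in the automotive or aeronautics domains, has proven very challenging, because the paradigm shift that ML brings completely changes traditional certification approaches. Objective: This paper aims to elucidate the challenges related to the certification of ML-based safety-critical systems, as well as the solutions proposed in the literature to address them, answering the question "How to certify machine learning based safety-critical systems?" Method: We conduct a systematic literature review (SLR) of research papers published between 2015 and 2020 covering topics related to the certification of ML systems. In total, we identified 217 papers covering topics considered to be the main pillars of ML certification: robustness, uncertainty, explainability, verification, safe reinforcement learning, and direct certification. We analyzed the main trends and problems of each subfield and provide summaries of the extracted papers. Results: The SLR results highlight the community's enthusiasm for the subject, as well as a lack of diversity in terms of datasets and model types. They also emphasize the need to further develop links between academia and industry to deepen the study of the domain. Finally, they illustrate the necessity of building connections between the main pillars mentioned above, which are for now mainly studied separately. Conclusion: We highlight the current efforts deployed to enable the certification of ML-based software systems and discuss some future research directions.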
This short study reformulates the statistical Bayesian learning problem using a quantum mechanics framework. Density operators representing ensembles of pure states of sample wave functions are used in place of probability densities. We show that such a representation makes it possible to formulate the statistical Bayesian learning problem in different coordinate systems on the sample space. We further show that such a representation makes it possible to learn projections of density operators using a kernel trick. In particular, the study highlights that decomposing wave functions rather than probability densities, as is done in kernel embedding, preserves the nature of probability operators. Results are illustrated with a simple example using a discrete orthogonal wavelet transform of density operators.
Graph Neural Networks (GNNs) have shown great potential in the field of graph representation learning. Standard GNNs define a local message-passing mechanism which propagates information over the whole graph domain by stacking multiple layers. This paradigm suffers from two major limitations, over-squashing and poor long-range dependencies, which can be addressed with global attention, but only at the price of raising the computational cost to quadratic complexity. In this work, we propose an alternative approach to overcome these structural limitations by leveraging the ViT/MLP-Mixer architectures introduced in computer vision. We introduce a new class of GNNs, called Graph MLP-Mixer, that holds three key properties. First, they capture long-range dependencies and mitigate the issue of over-squashing, as demonstrated on the Long Range Graph Benchmark (LRGB) and the TreeNeighbourMatch datasets. Second, they offer better speed and memory efficiency, with a complexity linear in the number of nodes and edges, surpassing the related Graph Transformer and expressive GNN models. Third, they show high expressivity in terms of graph isomorphism, as they can distinguish at least 3-WL non-isomorphic graphs. We test our architecture on 4 simulated datasets and 7 real-world benchmarks, and show highly competitive results on all of them.
We study inductive matrix completion (matrix completion with side information) under an i.i.d. subgaussian noise assumption in a low-noise regime, with uniform sampling of the entries. We obtain for the first time generalization bounds with the following three properties: (1) they scale like the standard deviation of the noise and in particular approach zero in the exact recovery case; (2) even in the presence of noise, they converge to zero when the sample size approaches infinity; and (3) for a fixed dimension of the side information, they only have a logarithmic dependence on the size of the matrix. Unlike many works on approximate recovery, we present results both for bounded Lipschitz losses and for the absolute loss, with the latter relying on Talagrand-type inequalities. The proofs create a bridge between two approaches to the theoretical analysis of matrix completion, since they combine techniques from both the exact recovery literature and the approximate recovery literature.
We introduce MegaPose, a method to estimate the 6D pose of novel objects, that is, objects unseen during training. At inference time, the method only assumes knowledge of (i) a region of interest displaying the object in the image and (ii) a CAD model of the observed object. The contributions of this work are threefold. First, we present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects. The shape and coordinate system of the novel object are provided as inputs to the network by rendering multiple synthetic views of the object's CAD model. Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner. Third, we introduce a large-scale synthetic dataset of photorealistic images of thousands of objects with diverse visual and shape properties and show that this diversity is crucial to obtain good generalization performance on novel objects. We train our approach on this large synthetic dataset and apply it without retraining to hundreds of novel objects in real images from several pose estimation benchmarks. Our approach achieves state-of-the-art performance on the ModelNet and YCB-Video datasets. An extensive evaluation on the 7 core datasets of the BOP challenge demonstrates that our approach achieves performance competitive with existing approaches that require access to the target objects during training. Code, dataset and trained models are available on the project page: https://megapose6d.github.io/.
When searching for policies, reward-sparse environments often lack sufficient information about which behaviors to improve upon or avoid. In such environments, the policy search process is bound to blindly search for reward-yielding transitions and no early reward can bias this search in one direction or another. A way to overcome this is to use intrinsic motivation in order to explore new transitions until a reward is found. In this work, we use a recently proposed definition of intrinsic motivation, Curiosity, in an evolutionary policy search method. We propose Curiosity-ES, an evolutionary strategy adapted to use Curiosity as a fitness metric. We compare Curiosity with Novelty, a commonly used diversity metric, and find that Curiosity can generate higher diversity over full episodes without the need for an explicit diversity criterion and lead to multiple policies which find reward.
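For readers unfamiliar with the Novelty baseline mentioned above, the following sketch shows one common formulation from the novelty-search literature: the mean distance from a policy's behavior descriptor to its k nearest neighbors in an archive of past behaviors. This is an assumption about the metric, not a detail taken from the abstract, and the names `novelty_score`, `archive`, and `k` are hypothetical.

```python
from math import dist


def novelty_score(behavior, archive, k=3):
    """Mean Euclidean distance from `behavior` to its k nearest archive entries.

    Assumed formulation: larger scores mean the behavior lies farther from
    anything seen so far, so it is considered more novel.
    """
    if not archive:
        raise ValueError("archive must contain at least one behavior")
    distances = sorted(dist(behavior, other) for other in archive)
    nearest = distances[:k]  # fewer than k entries are used if the archive is small
    return sum(nearest) / len(nearest)


# Hypothetical 2-D behavior descriptors, e.g. final (x, y) positions of an agent.
archive = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
score = novelty_score((0.0, 0.0), archive, k=2)
print(score)
```

An explicit diversity criterion of this kind needs a hand-chosen behavior descriptor and an archive, which is the overhead the abstract says Curiosity avoids.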
Deep Neural Networks (DNNs) outshine alternative function approximators in many settings thanks to their modularity in composing any desired differentiable operator. The resulting parametrized functional is then tuned to solve the task at hand using simple gradient descent. This modularity comes at a cost: strictly enforcing constraints on DNNs, e.g. from a priori knowledge of the task or from desired physical properties, remains an open challenge. In this paper we propose the first provable affine constraint enforcement method for DNNs that requires minimal changes to a given DNN's forward pass, that is computationally friendly, and that leaves the optimization of the DNN's parameters unconstrained, i.e. standard gradient-based methods can be employed. Our method does not require any sampling and provably ensures that the DNN fulfills the affine constraint on a given region of the input space at any point during training and testing. We coin this method POLICE, standing for Provably Optimal LInear Constraint Enforcement.
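Despite the empirical successes of self-supervised learning (SSL) methods, it is unclear which characteristics of their representations lead to high downstream accuracy. In this work, we characterize the properties that SSL representations should satisfy. Specifically, we prove necessary and sufficient conditions such that, for any task invariant to the given data augmentations, the desired probes (e.g., linear or MLP) trained on that representation attain perfect accuracy. These requirements lead to a unifying conceptual framework for improving existing SSL methods and deriving new ones. For contrastive learning, our framework prescribes simple but significant improvements to previous methods, such as using asymmetric projection heads. For non-contrastive learning, we use the framework to derive a simple and novel objective. Our resulting SSL algorithms outperform baselines on standard benchmarks, including SwAV+multicrops on linear probing of ImageNet.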
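Diagnosing erythema migrans (EM) skin lesions, the most common early symptom of Lyme disease, using deep learning techniques can be effective in preventing long-term complications. Existing work on deep-learning-based EM recognition uses only lesion images, owing to the lack of a dataset of Lyme disease images with associated patient data. Physicians rely on patient information about the background of the skin lesion to confirm their diagnosis. To assist the deep learning model with a probability score calculated from patient data, this study elicited the opinions of fifteen doctors. For the elicitation process, a questionnaire with EM-related questions and possible answers was prepared. The doctors assigned relative weights to the different answers to each question. We converted the doctors' evaluations into probability scores using Gaussian-mixture-based density estimation. To validate the elicited probability model, we used formal concept analysis and decision trees. The elicited probability scores can be used to make image-based deep learning pre-scanners for Lyme disease more robust.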